SynBa: improved estimation of drug combination synergies with uncertainty quantification

Abstract Motivation There exists a range of different quantification frameworks to estimate the synergistic effect of drug combinations. The diversity and disagreement in estimates make it challenging to determine which combinations from a large drug screening should be proceeded with. Furthermore, the lack of accurate uncertainty quantification for those estimates precludes the choice of optimal drug combinations based on the most favourable synergistic effect. Results In this work, we propose SynBa, a flexible Bayesian approach to estimate the uncertainty of the synergistic efficacy and potency of drug combinations, so that actionable decisions can be derived from the model outputs. The actionability is enabled by incorporating the Hill equation into SynBa, so that the parameters representing the potency and the efficacy can be preserved. Existing knowledge may be conveniently inserted due to the flexibility of the prior, as shown by the empirical Beta prior defined for the normalized maximal inhibition. Through experiments on large combination screenings and comparison against benchmark methods, we show that SynBa provides improved accuracy of dose–response predictions and better-calibrated uncertainty estimation for the parameters and the predictions. Availability and implementation The code for SynBa is available at https://github.com/HaotingZhang1/SynBa. The datasets are publicly available (DOI of DREAM: 10.7303/syn4231880; DOI of the NCI-ALMANAC subset: 10.5281/zenodo.4135059).


Introduction
With the increased use of small molecule drugs in monotherapy and drug combination treatments, off-target toxicity and resistance to treatments are becoming clear challenges. The use of combinations of drugs offers a possible solution to reduce toxicity minimizing doses and bypassing resistance with alternative targeting. Thanks to the development of highthroughput approaches to screen drug sensitivity in cell lines, dose-response data for a large number of combinations have been made available. Examples include the AstraZeneca-Sanger DREAM challenge (DREAM) (Menden et al. 2019), NCI-ALMANAC (Holbeck et al. 2017), DrugComb (Zagidullin et al. 2019), and the screening data from the Wellcome Sanger Institute (Jaaks et al. 2022). The availability of these datasets makes it possible to predict the effect of drug combinations from modelling drug-sensitivity data, leveraging information from biological features of the cell lines and chemical characteristics of the drugs (Bulusu et al. 2016).
To understand how drugs can work synergistically when combined, models with a quantification framework are required. Traditional quantification frameworks usually involve a null surface, based on a number of set assumptions. The Bliss model is based on the Multiplicative Survival Principle (Bliss 1939), whereas the Loewe model is based on the Dose Equivalence Principle (Loewe 1953). In the past decade, parametric methods have started to emerge as alternatives to the above. These include MuSyC (Meyer et al. 2019;, BRAID (Twarog et al. 2016), and the Effective Dose model (Zimmer et al. 2016). All these frameworks are based on different assumptions and parameterizations. As a result of this, the outputs from these models often disagree. Although there have been efforts to unify the frameworks, this is still an open problem for the field.
To define a common framework for drug combination, we need a less ambiguous definition of 'synergy' (Tang et al. 2015). When a combination is said to be 'synergistic', it is unclear whether it implies that the combination is desirable in terms of its 'potency' or its 'efficacy'. Potency is the amount of dosage required for a drug to produce a specified effect, whereas efficacy is the degree of the beneficial effect produced by the drug (Meyer et al. 2019). A strong synergistic potency implies the toxicity may be reduced when the drugs are combined, which is crucial for avoiding overdose, whereas a strong synergistic efficacy implies that the combination increases the maximal possible effect. Both aspects are relevant for progressing with pre-clinical and subsequent clinical investigation. An ideal combination would be potent and effective. However, in clinical research, there are situations where a drug partner is requested only for enhancement of potency and not to increase efficacy. Until recently, in quantification frameworks including Bliss, Loewe, BRAID, and the Effective Dose Model, potency and efficacy are entangled within the concept of 'synergy' as defined in the traditional quantification frameworks. To tackle this problem, Meyer et al. (2019) developed MuSyC, a framework that decouples potency and efficacy following the principles of the generalized Hill equation. Although the MuSyC approach is effective for modelling both potency and efficacy, the model does not fully explore the challenge of model-based estimation of uncertainty. There are various sources of uncertainty associated with drug-sensitivity modelling. First, the biology of dose-response relationships is still unknown despite existing efforts. This leads to uncertainty associated with insufficient scientific knowledge. Second, there are systematic and random errors arising from the experimental procedures. Third, biological variation exists among cell lines of the same disease, resulting in a further source of noise. Finally, uncertainty also stems from limited information that can be extracted due to the small size of available data ('epistemic uncertainty').
Given these multiple sources of uncertainty, it is often impossible to reach an accurate estimate of the quantities of interest, e.g. potency of a monotherapy or the synergistic effect (in terms of either potency or efficacy) of a combination. Since the 'best' single deterministic estimates might not be reached, here we focus on accurately quantifying uncertainty in the model as deviation from the true estimate given the noise (e.g. the credible range of model parameter estimates/ predictions).
Most existing frameworks for drug combinations either do not compute the uncertainty or estimate it by standard error (Zimmer et al. 2016) or parametric bootstrap , each of which contains unrealistic assumptions. The standard error is only accurate when a large number of samples is available, which is not the case for most pharmacology datasets. Similarly, for the parametric bootstrap, an accurate uncertainty estimation relies on a sufficiently large number of observations to be reasonably close to the truth. In either case, the uncertainty estimation will often be inaccurate due to the small data size. For this reason, here we approach uncertainty estimation with a Bayesian framework, which incorporates the uncertainty by treating all parameters of interest as probabilistic quantities. This enables us to continuously model the uncertainty in our estimates as the number of measurements grows, without becoming over-confident. Moreover, the estimated quantities of interest are obtained simultaneously with their uncertainties, making this approach computationally efficient.
In the current literature, there are examples that use a probabilistic model to incorporate uncertainty in their outputs. Hand-GP (Shapovalova et al. 2021) is a non-parametric model based on the combination of the Hand model with Gaussian processes, providing more believable uncertainty estimation than MuSyC in some cases. However, Hand-GP does not incorporate the 1D Hill equation that imposed biological constraints useful for providing interpretable model outputs. For example, monotonicity of the monotherapy fitted curve is not enforced in Hand-GP, meaning it is possible that the model produces a dose-response surface that is unlikely to occur in the in vitro setting (Tansey et al. 2022). More importantly, due to the non-parametric structure of Hand-GP, the parameters describing potency and efficacy are lost. The bayesynergy model (Rønneberg et al. 2021) is a Bayesian framework that models synergistic interaction effects using Gaussian Processes, which provide uncertainty quantification. Although flexible, its formulation is based on the Bliss independence assumption that is biased against drug combinations with a moderate level of efficacy ). The bayesynergy model also does not separate out synergistic potency and efficacy.
Here, we design a flexible Bayesian framework to infer synergistic effects of drug combination (SynBa) where (i) the classic Hill equation is preserved to produce estimates of efficacy and potency (ii) the existing biological knowledge or insight from historical data may be conveniently added through the prior distribution over parameters in the model. In SynBa, we will use MuSyC as a baseline framework to decouple synergistic potency and efficacy and add probabilistic inference to provide outputs and their associated uncertainty. This is to design a framework estimating the most favourable scores and efficiently provide optimal candidates to the drug discovery pipeline with actionable decision-making criteria derived from the model outputs. SynBa is also the first synergy framework that simultaneously provides principled uncertainty estimation, preserves the Hill equation, and decouples the synergistic efficacy and potency.
With our approach, when a combination is predicted to be synergistic, a level of confidence will be quantified for this prediction, together with a level of improvement in efficacy or in potency. The associated uncertainty can guide decision on further laboratory experiments to proceed to the next phase. This will provide 'actionable metrics' for the subsequent stages of the drug discovery pipeline, which is an unmet need.

Methods
Our proposed method can be used for both analysing existing dose-response data and predicting unseen dose-response data for a given monotherapy D ¼ ðX; YÞ ¼ fðx i ; y i Þg or a given combination D ¼ ðX; YÞ ¼ fðx i ; y i Þg. The covariate can be a scalar x i (in monotherapy) or a vector x i (in combination) corresponding to the drug dosages. The response y i can be defined as cell growth or inhibition of growth, depending on how the data are collected. In this study, we focus on inhibitory datasets, where a large dosage typically results in growth inhibition. In this case, y i is defined as the percentage of growth-inhibited cells. Nevertheless, our method can be easily modified to accommodate the opposite setting where the drug response is enhancing growth with respect to the dosages. Figure 1A illustrates a typical dose-response matrix for a combination from a screening, where the first row and column contain monotherapy data and the remaining entries contain combination data. The core aim of our method is to  infer the synergistic potency and efficacy given such a matrix (or a vector in the case of monotherapy). To accomplish this, we designed SynBa, a Bayesian framework for the inference of Synergistic effects of drug combinations. SynBa is defined by a prior distribution pðhÞ for the parameters h and a likelihood function pðYjh; XÞ for the drug responses Y. The likelihood function describes the probability of the responses given the dosages X and the parameters h. The prior distribution encodes the existing belief or knowledge about the parameters. In monotherapy, we have h ¼ fE 0 ; E 1 ; C; H; rg, whereas in combination, h ¼ fE 0 ; E 1 ; E 2 ; E 3 ; C 1 ; C 2 ; H 1 ; H 2 ; a; rg. The likelihood function and parameters encode the shape of the dose-response curve (defined in Boxes 1 and 2 and described in more detail below). In addition, our method provides a way of predicting unseen dose-response data for both monotherapies and combinations. For example, in the case of (1) wherex is the untested dosage of interest,ỹ is the predicted response, and pðhjX; YÞ is the posterior distribution over the parameters h given training data (X, Y).

Overview of SynBa: Monotherapies
We begin by defining SynBa for monotherapy screens, where the dosage x is a scalar. The likelihood function for the response y is based on the Hill equation (Hill 1910), which has been the classic choice to model pharmacology data: where x is the dosage of the drug, E(x) is the corresponding measured response, and H controls the slope of the curve.
The interpretation of C depends on the problem and the dataset of interest. In this study, as the focus is on inhibitory datasets, C represents the dosage required to inhibit the given biological process or biological component by 50%, known as IC 50 . C quantifies the potency of the monotherapy in the study.
The base level of the monotherapy is denoted by E 0 :¼ Eð0Þ, which is the response when no drug is applied. The efficacy is quantified by E 1 :¼ Eð1Þ, or denoted as E inf , which is the maximal inhibition when a sufficiently large dosage x is applied.
Box 1 defines the prior distribution for the parameters and the likelihood model for the response given the dosages, which are also illustrated in Fig. 2.
To account for observational noise in the data, we define a noise model for y centred around E(x). For each fixed experiment setting (i.e. a fixed cell line treated by a fixed monotherapy or combination in the same laboratory environment), y is modelled as having Gaussian noise that is conditionally independent given the dosage: y $ N ðEðxÞ; r 2 Þ. The assumption of conditional independence is valid because in screenings such as DREAM, the measurements for each monotherapy or combination are performed independently in different plates, instead of being performed sequentially. Therefore, given a set of measurements for a monotherapy or a combination, the noise level of their corresponding responses is independent. The i.i.d. noise is illustrated as orange bell-shaped curves in Fig. 2. As a result, the likelihood function for y is defined as Equation (4).
The datasets are normalized by measuring cell inhibition when no drug is added (at dose zero). However, due to the noise in the biological process, the inhibition at dose zero would not be the same if the experiment is repeated multiple times. Thus, the normalization procedure itself contains uncertainty. Therefore, we define E 0 to be probabilistic instead of a fixed initial value. A Gaussian prior with mean B and variance 0:03B is given for E 0 , where B is the normalized inhibition of a dose zero, e.g. B ¼ 100 in DREAM. The variance of this prior is defined to be 0:03B so that it is flexible enough to allow for errors, but not too conservative.
The prior for the normalized maximal response (i.e. , with a range of ½0; 1) may be defined in various ways depending on whether to insert existing knowledge or historical information. One option is to remain uninformative and impose a uniform prior, as shown in Fig. 2A.
Alternatively, we may make use of the existing information from the monotherapy data available. According to the single agent datasets in DREAM, the empirical distribution of E inf has a high density on both extremes of the range, with 17.9% of them <0.05% and 15.1% equal to 1 (after normalizing to the interval ½0; 1). Using maximum likelihood estimation to fit a Betaða; bÞ distribution to these E inf values, we obtain a ¼ 0.46 and b ¼ 0.58. To account for this information, we define Betað0:46; 0:58Þ as the second option of the prior for the normalized maximal response, as illustrated in Fig. 2B. This prior is consistent with the biological behaviour that a drug administered with a sufficiently large dose will either kill the majority of the targeted cells if effective, or very few of them if not effective. The choice of this empirical prior shows how existing knowledge or information on the dynamic/kinetic of drugs can be conveniently added to SynBa through its priors.A uniform prior is imposed for log C (the logarithm of IC 50 ), with C bounded by d and M. The values of d and M depend on the dataset and the unit of the dosages. In this work, we define d to be smaller than any non-zero dosage in the dataset (i.e. 0 < d < minfxjx > 0; ðx; yÞ 2 Dg) and M to be larger than any dosage in the dataset (i.e.  (B) for the monotherapy prior model. The prior for log C is plotted underneath the x-axis, which is uniform in both options. The priors for E 0 (which is Gaussian in both options) and the normalized E 1 are plotted along the y-axis. Given E 0 , the normalized E 1 is uniform in Option (A) and follows a Beta(0.46, 0.58) distribution in Option (B). The 300 blue curves are random samples from the expected prior responses E½Y jh where h are sampled from the prior distributions defined in Box 1. The seven red points illustrate an example set of monotherapy dose-response data D. The black curve is a sample from the expected posterior responses E½Y jh; D after the model is fitted to the data D, while the orange bell-shaped curves illustrate the i.i.d. Gaussian noise for the responses.
M > maxfxjðx; yÞ 2 Dg). For example, d ¼ 10 À10 and M ¼ 10 6 is a viable choice for DREAM, whereas in NCI-ALMANAC, we may have d ¼ 10 À15 and M ¼ 10. The idea is to be uninformative about log C a priori, since the ideal dosage range for the experiments is unknown and often unsuitable, either too small or too large. It is common that IC 50 exceeds the maximum dosage. In other cases, IC 50 may lie between zero dosage and the smallest non-zero dosage, due to the tested dosages being too large. Our method includes these possibilities a priori.
H and r are both given a lognormalð0; 1Þ prior because they are both non-negative and assumed to be moderately small. In addition, previous literature indicated that H is approximately lognormal . A lognormalð0; 1Þ prior ensures that PðH < 5Þ % 0:95 and Pðr < 5Þ % 0:95 a priori.
The blue curves in Fig. 2 are 300 random samples from the expected prior responses E½Yjh, where h are sampled from the prior distributions. It can be observed that the curves cover a wide range of possibilities a priori. Yet, they are not excessively general, as all of them follow the Hill equation.
Note that the priors we have chosen are motivated by general knowledge and previous literature. When more knowledge exists for a specific combination, the prior can be conveniently adjusted to accommodate this, since the inference framework is agnostic to the choice of prior. Figure 3 is an illustrative example of SynBa trained on a monotherapy (the compound MTOR_1 treated on the cell line MDA-MB-231) with six measurements, taken from the DREAM dataset. The first row shows the resulting model with a uniform prior for the normalized E inf (i.e.Ẽ 1 ), whereas the second row shows the model with the Betað0:46; 0:58Þ prior forẼ 1 . We start with two measurements and add two additional responses each time. It can be observed that the posterior distribution for IC 50 narrows down quickly for both models because the observed responses span across the range between 40 and 100, which provides sufficient information to estimate IC 50 with low uncertainty. The posterior for E inf , on the other hand, is more uncertain, due to the dosage range being too small to observe the convergence of the responses. In this case, the two models provide a visibly different posterior distribution for E inf , due to the different priors. As shown in Fig. 3F, the maximum a posteriori probability estimate for E inf is 0 when the Beta prior is imposed, which results from the inserted prior knowledge that the response is likely to converge to 0 if an effective (but non-zero) response is already observed. This example shows how the prior design affects the inference of the parameter uncertainty. Nevertheless, if we look at the posterior predictive distribution for the responses (with samples illustrated by light blue curves), the two models reach a similar conclusion. We will show in Section 3 that the predictive performance of the model is insensitive to the choices of prior.

Overview of SynBa: Combinations
Extending SynBa to combinations of two drugs, the goal is now to model the dose-response surface Eðx 1 ; x 2 Þ ¼ f ðx 1 ; x 2 Þ where x 1 and x 2 are the dosages for the two drugs, whereas Eðx 1 ; x 2 Þ is the response, and f is some class of function to be defined.To define our likelihood model for the responses Y, Box 1. Overview of SynBa for the inference of monotherapy dose-response data.
The joint prior distribution for the response and the parameters of the curve given the dosages is pðY; E 0 ; E 1 ; C; H; rjxÞ ¼ pðYjE 0 ; E 1 ; C; H; r; xÞpðE 1 jE 0 ÞpðE 0 ÞpðCÞpðHÞpðrÞ; ( 3) where the likelihood model for the responses Y given the dosage x is defined as independently for all y 2 Y , where E 0 :¼ Eð0Þ; E 1 :¼ Eð1Þ, and r is the standard deviation of the noise level of y, and the priors are (5) x n are the dosages and d 2 ð0; x 2 Þ is a small non-zero value to avoid log C being undefined. i124 Zhang et al.
we take inspiration from MuSyC but maintain some flexibility on their model assumptions. The effect of a drug in a system is usually described by the Hill equation that describes the state of equilibrium of a reversible process between an unaffected population and an affected one (the principle of 'detailed balance'). To obey to this equation and its effects, in our model, we incorporate the assumptions of the principle of detailed balance, of the proliferation rate of unaffected population and of the saturation of the maximum effect of the drug in the affected population. In MuSyC, these assumptions are defined in a nested structure in which levels are called Tiers (see Supplementary Table S5 in Meyer et al. 2019). In defining SynBa for combinations, we adopt the same model assumptions as Tier 4 of the levels specified by MuSyC. This category encodes the most complex class of models that still maintains the assumption of detailed balance. It is worth noting that, conversely to MuSyC, our model posterior covers all four tiers simultaneously. This is because Tier 4 subsumes all lower tiers and thus for how our model is defined, they will not be eliminated from the posterior distribution, unless the evidence from the data is strongly against them. The concepts of Tiers as described in MuSyC would require a post-learning model selection. The use of a Bayesian approach avoids such a selection procedure whilst still maintaining biologically viable assumptions.
In contrast to MuSyC, in SynBa, we choose to maintain the detailed balance assumption to also avoid overparameterization, which is likely to occur due to the limited data size for each drug combination set. For example, the majority of the combinations in the NCI-ALMANAC dataset has a data size of 15 (excluding the base level at dose zero), only 3 more than the number of parameters in MuSyC. In a real-world scenario, the data size may often be even smaller. Furthermore, with the detailed balance assumption, matrix multiplication and inversion are avoided, which lowers the computational cost.
Box 2 defines the prior distribution for the parameters and the likelihood model for the response given the dosages. The definitions of the priors are a natural extension of the monotherapy model, with the same arguments being followed. The only new parameter is a, which follows a lognormal prior with median 1 because a is non-negative and equals 1 when the combination is neither synergistic nor antagonistic in terms of potency.

Inference of the synergy
After inferring the posterior for the parameters and their associated uncertainty, we focus on distinguishing the effect of efficacy and potency in drug combinations. MuSyC has defined metrics for both synergistic efficacy and synergistic potency, which is a promising step in decoupling potency and efficacy. However, the uncertainty for these two quantities has not been quantified systematically. Our model output includes not only quantification for the synergistic efficacy and the synergistic potency, but also a separate uncertainty estimation for each.
For the synergistic efficacy, one simple yet informative quantity is DHSA ¼ minðE 1 ; E 2 Þ À E 3 which is the change in the maximal effect between the combination and the more effective single drug of the two (Greco 1995). A positive score indicates synergistic efficacy. As E 1 , E 2 , and E 3 are probabilistic, the resulting DHSA score is also probabilistic. A metric such as Box 2. Overview of SynBa for the inference of combination dose-response data.
where 0 ¼ x i;1 x i;2 Á Á Á x i;n are the dosages for drug i, and d is a small non-zero value to avoid log C being undefined.

Uncertainty quantification of drug combination synergies i125
PðDHSA > 0jDÞ (9) can then be defined to estimate how confident we are about the 'synergistic efficacy' of the combination, based on the dataset D. It is possible to have a synergistic combination that is highly uncertain, which would indicate that more data are required to reach confidence in the estimation. For the synergistic potency, a contains the required information. a > 1 would indicate synergistic potency , which means the potency of the two drugs has reduced due to being combined. Consequently, we define Pða > 1jDÞ (10) to estimate how likely a combination satisfies 'synergistic potency'.

Case studies
To illustrate how the uncertainty estimation from our method can be explained and further used for decision-making, we take two combinations from the DREAM dataset as examples. Figure 4A shows the dose-response matrix for the combination of ADAM17 and AKT acted on the cell line BT-20. The first column is the monotherapy dose-response measurements for AKT (as the dosage for ADAM17 is zero), whilst the first row is the monotherapy measurements for ADAM17 (as the dosage for AKT is zero). It can be observed that the responses for AKT start to decrease at a higher rate when the dosage increases, but the dosage range is too small to understand its potency (IC 50 ) and efficacy (E inf ). The efficacy cannot be determined because the response has not shown any sign of convergence, whereas the potency cannot be determined because it relies on understanding the maximal response, which is itself uncertain. However, if a deterministic Hill equation is fitted to the monotherapy data, it will only provide a point estimate for the IC 50 and E inf , without acknowledging the above caveats. On the contrary, our method provides an uncertainty estimation for both quantities. As shown in Fig. 4B-C, the posterior distribution of IC 50 and E inf has large variances, which correspond to large uncertainty. In particular, as shown in Fig. 4C, the posterior of E inf for ADAM17 has a multimodal shape, which is sensible because it is unclear whether the dosage range is too small (which corresponds to the peak at 0), or the drug is simply ineffective regardless of the dosage (which corresponds to the peak at around 90).
Moving to the inference of the full combination matrix, most existing synergy methods have no means to showcase the uncertainty. Our method, on the contrary, provides the uncertainty around the synergistic potency and the synergistic efficacy, as shown in Fig. 4D-F. According to the model output, the combination is moderately likely to be synergistically potent (with a probability of 85.2%), but it is difficult to conclude its synergistic efficacy (with a probability of 62.9% to be synergistic efficacious). This is reasonable because the excessively small dosage range makes it difficult to conclude anything about efficacy with high certainty, but with the 25 available measurements on the plates where the two drugs have interacted, information can be extracted on whether combining the two drugs may lower the level of toxicity required to reach the same beneficial effect. This combination is an example where the model implies some potential in the synergy of the combination, but the level of uncertainty in the synergy is still high, which may require more measurements at larger dosages to be narrowed down.
We now consider the combination of AKT and EGFR acted on the cell line MDA-MB-468. Figure 5 shows its dose-response matrix and the inference result from our model for the monotherapies and the combination, respectively. All parameters and metrics of interest have low variances, representing low uncertainties. As shown in Fig. 5B and C, the dosages have suitable ranges and approximately follow the sigmoidal shapes of the expected dose-response fit, in particular for AKT. They contain sufficient information for the possibilities for IC 50 and E inf to be narrowed down. Similarly, the combination data are well-behaved. Figure 5C-E shows that the probabilities of this combination being synergistic in terms of potency and efficacy are both close to 100%. These are signs that this combination is worth being taken to subsequent steps in the drug development pipeline.
The two examples above show that concrete decisions can be made based on the posterior distributions (e.g. for IC 50 ; E inf ; DHSA, and a) from our model, and more importantly, the uncertainties associated with these distributions.   Zhang et al.

Training details
The models are trained by Stan, a state-of-the-art platform for statistical modelling and high-performance statistical computation, particularly for Bayesian computation (Stan Development Team 2023). The user specifies the prior model of the parameters and the likelihood model of the data, while Stan performs either full Bayesian statistical inference with Markov chain Monte Carlo (MCMC) sampling, or approximate Bayesian inference with variational inference. In this study, we opt for MCMC due to the importance of the reliability of the output, which is ensured by the asymptotic exactness of the MCMC inference. Despite choosing the slower option of the two, SynBa is still computationally efficient. Running on four CPUs of the Intel Xeon Platinum 8276 CPU Processor, the median time taken to fit SynBa (via MCMC with 1000 iterations and 4 chains, including 500 iterations in the warm-up phase) to a six-by-six dose-response matrix in DREAM is 10.2 s, which is comparable to MuSyC with bootstrap.
With this training pipeline, we can avoid the overhead that occurs during the usage of non-linear optimization packages in deterministic parametric methods such as MuSyC, BRAID, and the Effective Dose model. A different choice of the numerical algorithm (and its hyperparameters) results in a different result for those methods. On the contrary, in SynBa, the same exact result can be found asymptotically via MCMC with Stan.
For the implementation of the other benchmark methods including MuSyC, BRAID, and the Effective Dose model, the Python package synergy (Wooten and Albert 2021) is used.

Prediction of drug combination responses
In this subsection, we show that in addition to providing uncertainty estimations, SynBa is competitive in predicting unseen responses within a dose-response matrix, and is less prone to overfitting compared with the existing methods.
The datasets of interest are DREAM (Menden et al. 2019) and NCI-ALMANAC (Holbeck et al. 2017), two of the most widely used publicly available combination screenings. In DREAM, we focus on all examples in the training set of Challenge 1 that have passed the Quality Assurance and that only contain non-negative responses and one set of replicates, which are 1631 sets of combinations in total. In NCI-ALMANAC, we focus on the subset defined in Julkunen et al. (2020), which is a subset of the data consisting of 50 unique FDA-approved drugs and 36 120 combinations. We remove examples that contain negative measurements, which results in 28 854 remaining combinations.
We leave out 20% of the non-zero dosage combinations for prediction. The models are trained using the remaining 80% dosage combinations and then evaluated on the left-out points. For the dose-response matrices in DREAM, we leave out 7 of the 35 points with non-zero dosages from the six-bysix matrix (see Fig. 1A) using a specific leave-out strategy. For each of the two monotherapy slices, one point is left out for testing. For the five-by-five combination grid (i.e. the orange cells in Fig. 1A), five points are randomly left out for testing. Figure 1B shows an example of such a train-test split. The measurements are left out in this manner so that each monotherapy contains one measurement for testing.
For the dose-response matrices in NCI-ALMANAC, it is not possible to leave out points separately for monotherapies and for combinations because the data size is too small. For most combinations, there are only three points for each monotherapy and nine points for the interactions. Thus, we directly leave out 3 of the 15 points randomly for prediction.

Evaluation metrics
To evaluate the predictive performance of the models, test likelihood and the root-mean-square error (RMSE) are computed using the left-out points. The former focuses on the goodness of the full predictive distribution, whereas the latter focuses solely on the goodness of the point estimates for the responses.
The computation of the test likelihood for SynBa follows Equation (1) for both monotherapy and combination. However, the right-hand side of the equation cannot be computed in a closed form, so Monte Carlo estimation is required using the expression whereỹ is the predicted response of the left-out dosage, D is the training data containing the known dosages and the known responses, and h m $ pðhjDÞ are the MCMC samples from the posterior distribution for the parameters. For MuSyC, BRAID, and the Effective Dose model, the parametric bootstrap pipeline described in  is followed. Each bootstrapped dataset provides a fitted curve. The density forỹ is then estimated by averaging its density computed on the models learnt from the bootstrapped datasets.
The computation of RMSE is more straightforward. For each combination, its RMSE for the test responses fy 1 ; . . . ; y N g is whereŷ i is the point estimate for the response that corresponds to dosage x i . For SynBa, we defineŷ i to be the posterior predictive mean E½y i jD; x i , which is estimated by Monte Carlo sampling from the posterior.

Quantitative results
We compare our prediction results against MuSyC, BRAID, and the Effective Dose model, which are three of the most widely used synergy models. For SynBa, we implement both the uniform prior and the empirical Beta prior for the normalized E inf , which we denote as SynBa-U and SynBa-B, respectively. Table 1 show the mean and the median of the test loglikelihood and the test RMSE for MuSyC, BRAID, Effective Dose model, and SynBa. Our method outperforms all three other methods in all metrics except for the mean test loglikelihood on DREAM. In particular, our method performs well on RMSE, a metric that only considers the quality of point estimates and ignores uncertainty. The upper diagonal panels in Fig. 6 show the scatter plots directly comparing the test RMSE values between methods (visualized with the blue colour). Our method is the most competitive, as evidenced by having more points above the diagonal y ¼ x line. These show that our method is not trading off predictive accuracy for uncertainty estimation. By following a principled Bayesian workflow, our model is strong in both prediction and uncertainty estimation. It is worth noting that at least one of MuSyC, BRAID, or the Effective Dose model fail to find a solution for 4.8% of the examples in DREAM and 38.4% of them in NCI-ALMANAC, despite an effort in tuning the bounds, initial values, and hyperparameters involved in the optimization. Most likely this is because these methods rely on external optimization packages with no guaranteed convergence, which can become a problem when overparameterization becomes severe due to small data sizes. SynBa does not incur this problem since its priors ensure conservative outputs when data size is too small.
To investigate whether SynBa is prone to overfitting and how it compares to the other three methods, we perform the same prediction experiment on DREAM, but with a train-test split ratio of 40% : 60% instead. As shown in the lower diagonal panels in Fig. 6, the test RMSE values (visualized with the red colour) increase significantly for MuSyC, BRAID, and the Effective Dose model. For SynBa, however, the test RMSE values have increased on average, but not by much. It can be seen that the mean value and the spread increase more significantly for the other three methods compared with SynBa. It can also be observed that the predictive performance of SynBa is not sensitive to the choice of prior, with the two priors producing very close RMSE values to each other.
In addition, we compare SynBa with bayesynergy and Hand-GP, the two probabilistic synergy models. As shown in Supplementary Table S1, the test RMSE for bayesynergy and Hand-GP is significantly higher on both datasets. This could be because drug interactions are modelled non-parametrically in both methods, meaning a wide range of functions is represented and it is difficult to narrow down the probable doseresponse curves or surfaces when the data size is small or noisy.

Uncertainty calibration
For a model M with learnt cumulative distribution F M with well-calibrated uncertainty, it would approximately follow the identity that F Y ðx i Þ % F M ðx i Þ for every data point fðx i ; y i ðx i ÞÞji ¼ 1; . . . ; Ng in the dataset, where y i ðx i Þ is a sample from the unknown true cumulative distribution F Y ðx i Þ. Equivalently, assuming the measurements y i ðx i Þ for a combination are conditionally independent given the dosages x i , their cumulative probabilities Fðy i Þ :¼ Pðy i ðx i Þ < F M ðx i ÞÞ would be approximately uniformly distributed between 0 and 1, if M is well-calibrated.
In this study, for each combination in DREAM, we split the 35 measurements (excluding the base value) with a 80%:20% ratio in the same way as the prediction evaluation in the previous subsection. We then evaluate the quality of the uncertainty calibration with the Kolmogorov-Smirnov (K-S) uniformity test (Massey 1951) for the empirical cumulative probabilities across all test data points. If the model is wellcalibrated, then the cumulative probabilities for the test data points will be approximately uniform for each combination. Otherwise, they will show a non-uniform pattern and the resulting P-value for the K-S test will be statistically significant. This procedure is performed across every combination in DREAM, resulting in a P-value for each combination. The histogram of the P-values for SynBa is in Supplementary Fig.  S1A, showing that 6.07% of the combinations have not passed the uniformity test, and thus are not well-calibrated. As a comparison, the same procedure is performed using MuSyC. Supplementary Fig. S1B shows that 25.1% of the combinations are not well-calibrated when modelled by MuSyC, which is roughly four times as high as the number for SynBa. This shows the estimated uncertainty from SynBa is more reliable and closer to the unknown ground truth on average.

Discussion
Machine learning methods have been developed for preclinical modelling and the prediction of drug combinations, thanks to the availability of large screenings (Julkunen et al. 2020;Wang et al. 2021), which are beneficial for discovering and explaining drug combinations. However, a few factors prevent most from being applied to real-world drug discovery projects. One issue is that performance measures rely on synergy scores, which do not have a gold standard and contain a non-trivial amount of uncertainty, as discussed in Section 1. The Spearman correlation of the replicate experiments in DREAM (Menden et al. 2019) andO'Neil et al. (2016) are 0.56 and 0.63,respectively, which show that quantifying a combination with a single synergy score would result in high variance. However, uncertainty measurement is not included in synergy score estimation. This could be one of the reasons that 20% of drug combinations are poorly predicted by all methods in the DREAM challenge. Measuring the uncertainties associated with the estimated scores is important for the subsequent decision-making process based on the model outputs. In real-world scenarios, scientists are often facing the decision to choose among a large set of drug combinations that score similarly in terms of synergy. Without quantification of how certain (or uncertain) the estimated scores are, they will have to rely on background knowledge compromising innovation in their choices. SynBa provides a way to implement a ranking strategy in the decision process of a drug discovery pipeline, which is a real-world unmet need. SynBa has the limitation that it only models the combination of two drugs, while there exist methods such as the Effective Dose Model that consider higher-order combinations. While this is a limitation, our initial aim is to provide a method that is reliable and not over-parameterized, to meet with practical needs.

Conclusions
We have developed a new framework for quantifying doseresponse relationships for monotherapies and combinations that provides a full uncertainty estimation for all parameters that are associated with the monotherapies and the combinations, including information about efficacy, potency, and synergy. These uncertainty information would be helpful to the biologists to make further decisions about progressing to the next stages of the drug discovery pipeline, or whether more experiments are required to lower the level of the uncertainty and better understand the drug mechanism of action.
We have also shown that SynBa is competitive in predicting unseen responses within a given dose-response matrix, and outperforms MuSyC, BRAID, and the Effective Dose model on DREAM and NCI-ALMANAC. In addition, the prediction performance is not sensitive to the choice of the priors.
In summary, our framework is capable of providing a reliable uncertainty estimation for the potency (e.g. IC 50 ) and the efficacy (e.g. E inf ) of a monotherapy, or the synergistic potency and efficacy of a combination, in a decoupled manner, and reliably predicting unseen responses within a dose-response matrix. The parameter uncertainties can be interpreted and used as guidance for further experiments and subsequent decision making.